Device for selective compression and automatic segmentation of a speech signal



Sept. 8, 1959    C. P. SMITH    2,903,515

DEVICE FOR SELECTIVE COMPRESSION AND AUTOMATIC SEGMENTATION OF A SPEECH SIGNAL    Filed Oct. 31, 1956    2 Sheets-Sheet 1

[Drawings, 2 sheets, not reproduced. Recoverable labels from Sheet 1 (Figure 1) include amplifier, filter, rectifier, and smoothing filter. Inventor: C. P. Smith.]

United States Patent Office

2,903,515

DEVICE FOR SELECTIVE COMPRESSION AND AUTOMATIC SEGMENTATION OF A SPEECH SIGNAL

(Granted under Title 35, U.S. Code (1952), sec. 266)

The invention described herein may be manufactured and used by or for the United States Government for governmental purposes without payment to me of any royalty thereon.

This invention relates to a device for enhancing the intelligibility of a voice signal by automatically reducing the amplitudes of the vowel sounds, thereby increasing the energy of the weaker consonant sounds relative to the total sound energy, and to a device for automatically segmenting a voice signal into its vowel segments and consonant segments.

Consonant speech sounds are essential to voice intelligibility. However, they are much weaker in intensity than the vowel sounds and are therefore more readily degraded by interference and noise.

My invention provides a means of increasing relative consonant energy without altering the characteristic energy envelopes of the consonants, i.e. their energy fluctuations in time. This is a very important consideration, because some consonant sounds, for example 's' as in 'see' and 't' as in 'tea', have very similar frequency spectra but differ principally in the rates at which energy builds up and decays. The energy envelopes, or energy patterns in time, must not be altered if clarity and high intelligibility are to be preserved. Further, many consonant sounds are of very short duration and immediately precede or follow vowel sounds. If the characteristic fluctuations, i.e. the rates of build-up and decay of sound energy, are to be preserved, a device operating on a speech signal must act almost instantaneously, with start and release times of the order of … to 20 milliseconds, so that it can alter the intensity of a vowel yet recover in time for a following consonant sound, which may occur only 30 to 40 milliseconds after the final vowel oscillation. My invention fulfills these requirements.
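As a rough check on the timing requirement, assume (this is not stated in the text) that release is governed by a single resistance-capacitance smoothing section whose time constant τ equals the 20 millisecond figure quoted above. After a vowel ends, the control voltage v then decays exponentially from its vowel value v_0:

```latex
\begin{aligned}
v(t) &= v_0\, e^{-t/\tau}, \qquad \tau = RC = 20\ \mathrm{ms} \\
\frac{v(30\ \mathrm{ms})}{v_0} &= e^{-1.5} \approx 0.223 \quad (\approx -13.0\ \mathrm{dB}) \\
\frac{v(40\ \mathrm{ms})}{v_0} &= e^{-2} \approx 0.135 \quad (\approx -17.4\ \mathrm{dB})
\end{aligned}
```

Under that assumption roughly a fifth of the vowel's control voltage would still be present when a consonant arrives 30 milliseconds later, which illustrates why release times well below the 30 to 40 millisecond vowel-to-consonant gap are needed.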

In ordinary speech, vowel sounds are characterized by frequency spectra having concentrations of energy in a few well-defined regions. The frequencies of these energy concentrations are called formants, in a nomenclature well established in the science of speech sounds.

The three principal formants of vowel sounds occur mainly in the frequency range from approximately 250 cycles per second to 2500 cycles per second. In some voices these limits are altered, but the above figures have been established by many workers as being typical.

In contrast, consonant sounds are characterized by frequency spectra that are much more diffuse, of much lower intensity, and have energy distributions centered about higher frequencies; some consonant sounds have their energy maximum well above 3000 cycles per second.

From a knowledge of these characteristics, I provide a device that produces an electrical signal indicating whether speech sounds are voiced or unvoiced, providing automatic classification during the speech continuum.

2,903,515    Patented Sept. 8, 1959

My invention will be more clearly understood from the following detailed description of specific embodiments thereof when read in conjunction with the accompanying drawings, wherein:

Figure 1 is a schematic drawing of one embodiment of my invention;

Figure 2 is a circuit diagram of a suitable variable-gain amplifier usable in the embodiments of Figures 1 and 3; and

Figure 3 is a schematic diagram of another embodiment of my invention.

Similar components in Figures 1 and 3 are given the same reference numbers, with primes used in Figure 3 to distinguish between the views.

My device is presented in simplified form in Figure 1. An electrical signal representing a voice signal is applied to signal input terminal 11 of amplifier 12, whose amplification is electronically variable. For the selective compression function, the amplification of amplifier 12 is approximately a linear function of the magnitude of the small D.C. control voltage at terminal 13: the amplifier operates at maximum amplification in the absence of a control voltage, and its amplification is reduced in proportion to the amplitude of the D.C. voltage at terminal 13. For the segmentation function, amplifier 12 acts as a switch controlled by the voltage applied to terminal 13, operating with either full amplification or zero amplification depending on the control voltage. Amplifier 12 may be of the special design shown in the circuit of Figure 2, or may be any conventional amplifier or modulator in which the amplification of a signal is controllable by means of electronic gain control; for segmentation it may be a simple relay or electronic switch. Such amplifiers, modulators, and switches are well known in the communications art.
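The two gain characteristics of amplifier 12 described above can be modeled behaviorally. The sketch below assumes an ideal straight-line law with a hypothetical maximum gain `g_max` and a hypothetical cut-off control voltage `v_cutoff`; neither value comes from the patent, and a real tube amplifier would only approximate these idealized curves.

```python
def compression_gain(v_control: float, g_max: float = 10.0, v_cutoff: float = 1.0) -> float:
    """Selective-compression mode (idealized): gain falls linearly from g_max
    at zero control voltage to zero at the hypothetical cut-off voltage."""
    fraction = 1.0 - v_control / v_cutoff
    return g_max * min(1.0, max(0.0, fraction))


def switch_gain(v_control: float, g_max: float = 10.0, v_cutoff: float = 1.0) -> float:
    """Segmentation mode (idealized): full gain below cut-off, zero gain at or above it."""
    return g_max if v_control < v_cutoff else 0.0


for v in (0.0, 0.25, 0.5, 1.0, 1.5):
    print(f"v = {v:4.2f} V: compression gain {compression_gain(v):5.2f}, "
          f"switch gain {switch_gain(v):5.2f}")
```

The clamp in `compression_gain` reflects the text's statement that maximum amplification occurs with no control voltage and that the amplifier can be driven to zero gain but not below.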

For selective speech compression, i.e. automatic reduction of the vowel amplitudes, a portion of the output signal from amplifier 12 appearing at output terminal 14 is fed back to control the amplification, the feed-back path being through band-pass filter 15, full-wave rectifier 16, and resistance-capacitance smoothing filter 17. A half-wave rectifier could be used, but a full-wave rectifier is preferred.

Band-pass filter 15 is a conventional wave filter, such as is commonly described in the communications literature. It passes the frequency spectrum containing the principal vowel formants, which carry the principal vowel energy. Frequencies between approximately 250 cycles per second and 2500 cycles per second are passed by filter 15, and other frequencies are excluded. Tests have shown that the exact frequency limits are not critical, and the upper frequency limit can be as low as 1000 cycles per second with satisfactory operation of both the selective compression and the segmentation functions.

The output signal of band-pass filter 15 is rectified in full-wave rectifier 16, of conventional design such as is commonly described in the communications literature. Thus a D.C. current proportional to the amount of speech energy in the pass band of filter 15 is produced and, after smoothing in filter 17, the resulting voltage at terminal 13 immediately reduces the gain of amplifier 12. As with all feed-back devices, the feed-back tends to reduce the gain to unity. The maximum reduction of vowel amplitudes, in decibels, is therefore equal to the quiescent amplification of amplifier 12 without feed-back, expressed in decibels.

When a consonant signal occurs, its intensity level and frequency spectrum are such that no significant signal is developed at terminal 13, since the consonant energy is mainly distributed outside the pass band of filter 15, and the consonant intensity is normally from … to 30 decibels lower than the vowel intensity.

In this manner the system automatically reduces the amplitudes of the vowel sounds, while acting as a linear amplifier for the consonant sounds, and consonant sounds occur in the output signal at amplitudes more nearly equal to vowel amplitudes than they would in ordinary speech, thereby enhancing voice clarity and resistance of the speech signal to interference.
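The feed-back loop of Figure 1 can be simulated sample by sample to see the relative consonant gain. This is a sketch under several assumptions not taken from the patent: an ideal linear gain law with a hypothetical maximum gain `g_max` of 10 (20 dB) and slope `k`; band-pass filter 15 modeled as a second-order Butterworth high-pass at 250 cycles per second followed by a second-order low-pass at 2500 cycles per second (RBJ audio-EQ-cookbook coefficients); smoothing filter 17 modeled as a one-pole equivalent of an RC section with a 10 ms time constant; a 16000 samples-per-second sampling rate; and steady tones standing in for a vowel formant and a weaker, higher-frequency consonant.

```python
import math


class Biquad:
    """Second-order IIR section, RBJ audio-EQ-cookbook coefficients, direct form I."""

    def __init__(self, kind: str, f0: float, fs: float, q: float = 1 / math.sqrt(2)):
        w0 = 2 * math.pi * f0 / fs
        cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
        if kind == "lowpass":
            b = ((1 - cw) / 2, 1 - cw, (1 - cw) / 2)
        elif kind == "highpass":
            b = ((1 + cw) / 2, -(1 + cw), (1 + cw) / 2)
        else:
            raise ValueError(f"unsupported filter kind: {kind}")
        a0 = 1 + alpha
        self.b0, self.b1, self.b2 = (c / a0 for c in b)
        self.a1, self.a2 = -2 * cw / a0, (1 - alpha) / a0
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def process(self, x: float) -> float:
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def selective_compress(x, fs, g_max=10.0, k=1.0, tau=0.010, band=(250.0, 2500.0)):
    """Output -> band-pass -> full-wave rectifier -> RC smoothing -> gain control."""
    hp = Biquad("highpass", band[0], fs)
    lp = Biquad("lowpass", band[1], fs)
    a = 1 - math.exp(-1 / (fs * tau))   # one-pole equivalent of an RC smoother
    v = 0.0                              # smoothed control voltage (terminal 13)
    y = []
    for s in x:
        g = g_max * max(0.0, 1.0 - k * v)             # gain falls linearly with v
        out = g * s
        y.append(out)
        rectified = abs(lp.process(hp.process(out)))  # feed-back taken from the output
        v += a * (rectified - v)
    return y


def tone(freq, amp, fs, dur):
    return [amp * math.sin(2 * math.pi * freq * n / fs) for n in range(int(fs * dur))]


def rms(sig):
    return math.sqrt(sum(s * s for s in sig) / len(sig))


fs = 16000
vowel = tone(700, 1.0, fs, 0.5)        # stand-in: strong energy inside the pass band
consonant = tone(5000, 0.1, fs, 0.5)   # stand-in: weak energy above the pass band
yv = selective_compress(vowel, fs)
yc = selective_compress(consonant, fs)
tail = slice(len(yv) // 2, None)       # measure after the loop has settled
ratio_in = rms(consonant) / rms(vowel)
ratio_out = rms(yc[tail]) / rms(yv[tail])
print(f"consonant/vowel level: {20 * math.log10(ratio_in):.1f} dB in, "
      f"{20 * math.log10(ratio_out):.1f} dB out")
```

Under these assumptions the consonant tone, 20 dB below the vowel tone at the input, emerges much closer to the vowel level, because the vowel drives the control voltage up and pulls its own gain down while the consonant passes at nearly full gain.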

Operation of the device as an automatic segmenter of speech sounds is achieved when function switch 18 is moved from terminal 19 to terminal 20, so that the control channel takes its input signal in parallel with the input to amplifier 12 rather than from the output of amplifier 12.

In this mode, the control channel operates in a manner very similar to that previously described. When a vowel sound occurs in the speech signal, part of its energy passes through band-pass filter 15, is rectified by full-wave rectifier 16, and the resulting D.C. current is smoothed in filter 17; the voltage appearing at the output of filter 17 controls the gain of amplifier 12. However, there is no longer any feed-back to regulate the gain control, so the self-regulating action typical of feed-back systems no longer limits the amount of gain reduction that can be achieved. If the signal in the control channel is of sufficient amplitude, enough voltage will be developed to turn off amplifier 12, reducing its gain to zero. When the input signal is adjusted to exceed this minimum amplitude, the device acts as an automatic segmenter: the amplifier acts as a linear device for the consonant sounds, but vowel sounds are automatically segmented out and do not appear in the output signal, since during these intervals the gain of amplifier 12 is reduced to zero.
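An open-loop sketch of this segmentation mode follows: the control voltage is derived from the input, and once it exceeds a cut-off level the gain is exactly zero. It assumes, for illustration only, a Butterworth 250 to 2500 cycles per second band-pass model (RBJ cookbook sections), a one-pole smoother with a 10 ms time constant, a linear gain law with a hypothetical cut-off control voltage `v_cutoff` of 0.25 (in units of input amplitude), and synthetic tones in place of real vowel and consonant segments.

```python
import math


def biquad_filter(x, kind, f0, fs, q=1 / math.sqrt(2)):
    """Second-order low- or high-pass (RBJ cookbook coefficients) applied to a list."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b0, b1, b2 = (1 - cw) / 2, 1 - cw, (1 - cw) / 2
    elif kind == "highpass":
        b0, b1, b2 = (1 + cw) / 2, -(1 + cw), (1 + cw) / 2
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    a0, a1, a2 = 1 + alpha, -2 * cw, 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = (b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, s, y1, y
        out.append(y)
    return out


def segment_out_vowels(x, fs, g_max=10.0, v_cutoff=0.25, tau=0.010):
    """Open-loop control: the band energy of the *input* gates the amplifier."""
    band = biquad_filter(biquad_filter(x, "highpass", 250.0, fs), "lowpass", 2500.0, fs)
    a = 1 - math.exp(-1 / (fs * tau))          # one-pole equivalent of an RC smoother
    v, y = 0.0, []
    for s, b in zip(x, band):
        v += a * (abs(b) - v)                   # full-wave rectify, then smooth
        g = g_max * max(0.0, 1.0 - v / v_cutoff)  # enough control voltage -> zero gain
        y.append(g * s)
    return y


fs = 16000


def tone(freq, amp, n0, n1):
    return [amp * math.sin(2 * math.pi * freq * n / fs) for n in range(n0, n1)]


# Hypothetical test signal: consonant-like 0-100 ms, vowel-like 100-300 ms,
# consonant-like 300-400 ms (sample indices at 16000 samples per second).
x = tone(5000, 0.1, 0, 1600) + tone(700, 1.0, 1600, 4800) + tone(5000, 0.1, 4800, 6400)
y = segment_out_vowels(x, fs)
vowel_mid = y[2400:4000]       # 150-250 ms: well inside the vowel
lead_consonant = y[320:1600]   # 20-100 ms: leading consonant
print("peak output during vowel:", max(abs(s) for s in vowel_mid))
print("peak output during consonant:", max(abs(s) for s in lead_consonant))
```

Because there is no feed-back here, nothing stops the control voltage from reaching cut-off, which is exactly the property the segmenter relies on; with a weaker input the vowel would only be attenuated, matching the threshold behavior described next.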

The basic methods of selective compression and automatic segmentation of speech sounds described in the previous paragraphs allow variations depending on the precise mode of operation desired.

The segmentation process I have described is amplitude-sensitive. I have shown that the input speech signal must be maintained above a certain threshold level for amplifier 12 to be cut off during all the vowel sounds, thereby achieving positive segmentation. If the speech input signal is below this lower limit, segmentation will be incomplete.

In addition to this lower limit, an upper limit is imposed on the input signal amplitude as well, for two reasons. First, the filter may not have perfect rejection outside the pass band, so a small portion of the consonant signals may pass through. Second, some consonants have a small amount of energy in the frequency band transmitted by band-pass filter 15. Normally this energy is too small to affect operation of the control channel. However, if the input speech signal amplitude is made very large, consonant signals will put enough energy into the control channel to cause erroneous operation of the device. A normalizing technique removes this limitation: the signal amplitude in the frequency band transmitted by band-pass filter 15 is compared with the total voice signal amplitude, by the circuit technique illustrated in Figure 3. Electronic comparison in this manner yields a voltage signal that indicates whether or not a given speech sound is a vowel sound, and that is independent of the amplitude of the input speech signal over very wide limits.

Thus the segmenter operates with the modified circuitry without the necessity of maintaining the input signal amplitude within close limits.
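The amplitude argument can be made explicit. In the derivation below, v_b and v_t denote the smoothed band-channel and total-signal voltages that a given sound produces at unit input level, and θ is the control voltage needed to cut off the amplifier; these symbols, and the idealization that the filter, full-wave rectifier, and RC smoother are linear or positively homogeneous (|Ax| = A|x| for A > 0), are assumptions made here for illustration.

```latex
\begin{aligned}
&\text{Input scaled by } A > 0: \quad V_b(A) = A\,v_b, \qquad V_t(A) = A\,v_t \\
&\text{Fixed threshold (Figure 1):} \quad \text{vowel detected} \iff A\,v_b > \theta \\
&\Rightarrow \text{correct only for} \quad \frac{\theta}{v_b^{(\mathrm{vowel})}} < A < \frac{\theta}{v_b^{(\mathrm{cons})}} \\
&\text{Normalized test (Figure 3):} \quad \frac{V_b(A)}{V_t(A)} = \frac{v_b}{v_t} \quad \text{(independent of } A\text{)}
\end{aligned}
```

The lower bound of the window is the threshold requirement on the input level; the upper bound is set by the small in-band consonant energy v_b^(cons). The ratio test has no such window.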

Referring to Figure 3, a portion of the input signal is rectified in conventional full-wave rectifier 21, and the resulting D.C. current is smoothed in conventional resistance-capacitance filter 22. Full-wave rectifier 21 is so connected that the polarity of its D.C. output signal is opposite in sign to that of full-wave rectifier 16'. Operation of filter 15', rectifier 16', and filter 17' is as previously described for the corresponding components of Figure 1. Resistors 23 and 24 add together the voltages from the two channels. The resulting signal at terminal 13 has a polarity that is determined by whether the input signal is a vowel or a consonant; the polarity is not affected by changes in the input signal amplitude, only by the relative distribution of energy in the spectrum of the input signal. Amplifier 12' acts as a switching device as before, being switched on by a control signal of one polarity and switched off by a control signal of the opposite polarity, thus achieving automatic segmentation of the speech signal. Alternatively, a high-speed polarized relay can replace amplifier 12' to achieve the same function.

I have described how the operation of the device can be made insensitive to the amplitude of the input signal. Another variation in the mode of operation of my device makes it possible to vary the selection limits of the segmenter, depending on whether it is desired to segment nasal consonants, voiced stop consonants, and voiced fricative consonants with the vowel segments or with the consonant segments. Alternatively, these categories of speech sounds can be segmented separately, so that only these sounds are included in the output signal, or only these sounds are automatically excluded from it.

The various modes of segmentation are achieved by varying the frequency band limits of the control channel. The voiced speech sounds are characterized by strong low-frequency energy components, in the frequency range of approximately 80 to 250 cycles per second. Lowering the low-frequency limit of band-pass filter 15 to 80 cycles per second results in all of the voiced sounds being automatically included in the vowel category. Excluding this 80 to 250 cycles per second band results in their being included in the consonant category. Taking the ratio of the energy in the 80 to 250 cycles per second band to that in the 250 to 2500 cycles per second band permits distinction between sounds that are vowels and sounds that are voiced but not vowels, i.e. voiced fricatives, etc. In each case, the speech spectra of the different speech sounds are the selection criterion for the segmenter.
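The three-way selection just described can be sketched with band-level ratios. Only the band edges (80, 250, and 2500 cycles per second) come from the text; the use of mean rectified level as the energy measure, the thresholds `voiced_min` and `vowel_min`, the Butterworth filter models, and the synthetic test tones are hypothetical choices for illustration.

```python
import math


def biquad_filter(x, kind, f0, fs, q=1 / math.sqrt(2)):
    """Second-order low- or high-pass (RBJ cookbook coefficients) applied to a list."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b0, b1, b2 = (1 - cw) / 2, 1 - cw, (1 - cw) / 2
    elif kind == "highpass":
        b0, b1, b2 = (1 + cw) / 2, -(1 + cw), (1 + cw) / 2
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    a0, a1, a2 = 1 + alpha, -2 * cw, 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = (b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, s, y1, y
        out.append(y)
    return out


def mean_rectified(sig):
    """Mean full-wave-rectified level over the second half (filters settled)."""
    half = sig[len(sig) // 2:]
    return sum(abs(s) for s in half) / len(half)


def band_level(x, fs, lo, hi):
    return mean_rectified(biquad_filter(biquad_filter(x, "highpass", lo, fs), "lowpass", hi, fs))


def classify(x, fs, voiced_min=0.2, vowel_min=1.0):
    total = mean_rectified(x)
    if total == 0.0:
        return "silence"
    low = band_level(x, fs, 80.0, 250.0)         # voicing band
    formant = band_level(x, fs, 250.0, 2500.0)   # principal vowel-formant band
    if low / total < voiced_min:
        return "unvoiced consonant"
    return "vowel" if formant / low >= vowel_min else "voiced non-vowel"


fs = 16000


def mix(*parts, dur=0.2):
    """Sum of (frequency, amplitude) sine components."""
    return [sum(a * math.sin(2 * math.pi * f * n / fs) for f, a in parts)
            for n in range(int(fs * dur))]


vowel = mix((150, 0.5), (700, 1.0))    # voicing plus a strong formant
nasal = mix((150, 0.5), (700, 0.05))   # voicing with weak formant energy
unvoiced = mix((5000, 0.3))            # high-frequency energy, no voicing
for name, sig in (("vowel", vowel), ("nasal", nasal), ("unvoiced", unvoiced)):
    print(name, "->", classify(sig, fs))
```

Moving the low edge of the formant band down to 80 cycles per second in `classify` would lump the voiced non-vowel case in with the vowels, which corresponds to the first mode described above.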

Whether the segmenter transmits only the vowel sounds or only the consonant sounds is established by the quiescent condition of amplifier 12. Operation is summarized as follows:

| Mode of operation of segmenter | Signal at terminal 13 | Amplifier 12 (quiescent) |
| Vowels switched off; only consonant segments at output | Disabling | On |
| Vowels switched on; only vowel segments at output | Enabling | Off |
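The two rows of the table reduce to one rule: during a vowel, the signal at terminal 13 moves amplifier 12 out of its quiescent state, so the amplifier conducts exactly when "quiescent on" and "vowel present" differ. The sketch below abstracts that logic only; the segment labels `V` and `C` are hypothetical stand-ins for the vowel/consonant decisions made by the control channel.

```python
def amplifier_on(quiescent_on: bool, vowel_present: bool) -> bool:
    """A vowel puts a signal on terminal 13 that flips amplifier 12 out of its quiescent state."""
    return quiescent_on != vowel_present


def segmenter_output(labels, quiescent_on):
    """Keep only the segments ('V' vowel, 'C' consonant) that the amplifier passes."""
    return [lab for lab in labels if amplifier_on(quiescent_on, lab == "V")]


speech = ["C", "V", "C", "C", "V"]
print(segmenter_output(speech, quiescent_on=True))   # ['C', 'C', 'C']
print(segmenter_output(speech, quiescent_on=False))  # ['V', 'V']
```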

Other, more conventional variable-gain amplifiers will function equally well. Operation of the amplifier shown in Figure 2 is as follows. The input speech signal is applied through conventional input transformer 42 to the grids of triode tubes 30 and 31 in push-pull. The grids of triode tubes 32 and 33, also in push-pull, are connected to the cathodes of tubes 30 and 31 through bias batteries 43 and 44, respectively. The cathode of triode tube 32 is cross-connected to the cathode of triode tube 31, and the cathode of triode tube 33 is cross-connected to the cathode of triode tube 30. Cathode resistors 34 and 35 are connected to the negative terminal of D.C. power supply source 38. The plates of triode tubes 30 and 31, which act as the signal triodes, are connected to conventional push-pull output transformer 36, whose center tap is connected to the positive terminal of D.C. power supply source 37.

Triodes 32 and 33 act in shunt with the cathode loads of triodes 30 and 31. The plate current for shunt triodes 32 and 33 is drawn from the positive terminal of D.C. power supply source 37 through triode 39, which is in series with them. The grids of triodes 30 and 31 are returned to D.C. bias source 40 through the mid-point connection of the secondary winding of input transformer 42.

The circuit operates in the following manner: In quiescent condition, the anode currents, and hence the cathode currents, of push-pull triodes 30 and 31 and shunt triodes 32 and 33 are equal. In this quiescent state, triodes 30 and 31 operate as a normal push-pull amplifier, and input signal applied to transformer 42 is amplified and appears as an output signal at transformer 36.

Series triode 39 regulates the current flowing through shunt triodes 32 and 33. When a control signal applied to terminal 41 causes triode 39 to conduct more heavily, shunt triodes 32 and 33 also draw more current, raising the cathode voltage. Since cathode resistors 34 and 35 are common to signal triodes 30 and 31, the result is increased bias on triodes 30 and 31, as well as increased negative feed-back; the amplification of triodes 30 and 31 is thereby reduced and made to vary linearly with the control voltage applied to terminal 41. If the grid of triode 39 is driven sufficiently positive, cut-off bias is developed across cathode resistors 34 and 35 by the large current drawn by triodes 32 and 33. If the voltage applied to the grid of triode 39 is switched from the quiescent condition to the cut-off voltage or beyond, the amplifier acts as a high-speed switching device.

Although the invention has been described in terms of specific apparatus set forth in considerable detail, it should be understood that this is by way of illustration only and that the invention is not necessarily limited thereto, since alternative embodiments and operating techniques will become apparent to those skilled in the art in view of the disclosure. For example, the gain of the amplifier could be controlled electromechanically, or remotely, in ways other than that shown in the specific embodiment above; however, the method shown above is perhaps the simplest. Accordingly, modifications are contemplated that can be made without departing from the spirit or the scope of the appended claims.

What is claimed is:

1. A device which is not amplitude sensitive for automatic segmentation of a speech signal comprising a variable-gain amplifier, means designed to provide a control voltage proportional to the amplitude of a limited band of frequencies within the normal range of speech frequencies connected to the input and the gain control of said amplifier, and means designed to provide a control voltage proportional to the amplitude of frequencies within the normal range of speech frequencies connected to the input and the gain control of said amplifier, the polarity of the voltage applied to the gain control of said amplifier by said first means being opposite to that applied by said second means providing for the operation of the amplifier as an electronic switch.

2. A device which is not amplitude sensitive for automatic segmentation of a speech signal comprising a high-speed polarized relay, means designed to provide a control voltage proportional to the amplitude of a limited band of frequencies within the normal range of speech frequencies connected to the input to said relay, and means designed to provide a control voltage proportional to the amplitude of frequencies within the normal range of speech frequencies connected to the input to said relay, the polarity of the voltage applied to control said relay by said first means being opposite to that applied by said second means.

3. A device which is not amplitude sensitive for the automatic segmentation of a speech signal comprising a variable-gain amplifier having an electronically-variable gain control; a rectifier connected to the input of said amplifier, a resistance-capacitance smoothing filter connected to the output of said rectifier, and the output of said smoothing filter connected through a resistor to the gain control of said amplifier; a band-pass filter designed to pass a limited band of frequencies within the normal range of speech frequencies connected to the input of said amplifier, a second rectifier connected to the output of said band-pass filter, a second resistance-capacitance smoothing filter connected to the output of said rectifier, and the output of said second smoothing filter connected through a second resistor to the gain control of said amplifier for the purpose of regulating the gain of the amplifier, the polarity of the voltage applied to the gain control of said amplifier from said first smoothing filter being opposite to that applied by said second smoothing filter providing for the operation of the amplifier as an electronic switch.

4. The device of claim 3 wherein a shunt diode is connected to the gain control to limit the voltage applied to the gain control and speed up the recovery time.

5. The device of claim 3 wherein said band-pass filter is designed to pass frequencies within the range of about 250 cycles per second as a lower limit and not more than about 2500 but not less than about 1000 cycles per second as an upper limit.

6. A device which is not amplitude sensitive for the automatic segmentation of a speech signal comprising a high-speed polarized relay; a rectifier connected to the input to said relay, a resistance-capacitance smoothing filter connected to the output of said rectifier, and the output of said smoothing filter connected through a resistor to control said relay; a band-pass filter designed to pass a limited band of frequencies within the normal range of speech frequencies connected to the input to said relay, a second rectifier connected to the output of said band-pass filter, a second resistance-capacitance smoothing filter connected to the output of said second rectifier, and the output of said second smoothing filter connected through a second resistor to control said relay, the polarity of the voltage applied to control said relay by said first smoothing filter being opposite to that applied by said second smoothing filter.


